18 research outputs found

    SPEAKER NORMALIZATION THROUGH FORMANT-BASED WARPING OF THE FREQUENCY SCALE

    No full text
    Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough training data are available to model acoustical variability among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce variability between speakers. Recent successful speaker normalization algorithms have incorporated a speaker-specific frequency warping to the initial signal processing stages. These algorithms, however, do not make extensive use of acoustic features contained in the incoming speech. In this paper we study the possible benefits of the use of acoustic features in speaker normalization algorithms using frequency warping. We study the extent to which the use of such features, including specifically the use of formant frequencies, can improve recognition accuracy and reduce computational complexity for speaker normalization. We examine the characteristics and limitations of several types of feature sets and warping functions as we compare their performance relative to existing algorithms. 1

    “Polyaural ” Array Processing for Automatic Speech Recognition in Degraded Environments

    No full text
    In this paper we present a new method of signal processing for robust speech recognition using multiple microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by multiple microphones through bandpass filtering and nonlinear halfwave rectification operations, and then cross-correlating the outputs from each channel within each frequency band. These operations provide rejection of off-axis interfering signals. These operations are repeated (in a non-physiological fashion) for the negative of the signal, and an estimate of the desired signal is obtained by combining the positive and negative outputs. We demonstrate that the use of this approach provides substantially better recognition accuracy than delay-and-sum beamforming using the same sensors for target signals in the presence of additive broadband and speech maskers. Improvements in reverberant environments are tangible but more modest. Index Terms: robust speech recognition, binaural hearing, auditory processing, speech enhancemen

    Multivariate-Gaussian-Based Cepstral Normalization for Robust Speech Recognition

    No full text
    In this paper we introduce a new family of environmental compensation algorithms called Multivariate Gaussian Based Cepstral Normalization (RATZ). RATZ assumes that the effects of unknown noise and filtering on speech features can be compensated by corrections to the mean and variance of components of Gaussian mixtures, and an efficient procedure for estimating the correction factors is provided. The RATZ algorithm can be implemented to work with or without the use of “stereo ” development data that had been simultaneously recorded in the training and testing environments. “Blind ” RATZ partially overcomes the loss of information that would have been provided by stereo training through the use of a more accurate description of how noisy environments affect clean speech. We evaluate the performance of the two RATZ algorithms using the CMU SPHINX-II system on the alphanumeric census database and compare their performance with that of previous environmental-robustness developed at CMU. 1

    BINAURAL AND MULTIPLE-MICROPHONE SIGNAL PROCESSING MOTIVATED BY AUDITORY PERCEPTION

    No full text
    It is well known that binaural processing is very useful for separating incoming sound sources as well as for improving the intelligibility of speech in reverberant environments. This paper describes and compares a number of ways in which the classic model of interaural cross-correlation proposed by Jeffress, quantified by Colburn, and further elaborated by Blauert, Lindemann, and others, can be applied to improving the accuracy of automatic speech recognition systems operating in cluttered, noisy, and reverberant environments. Typical implementations begin with an abstraction of cross-correlation of the incoming signals after nonlinear monaural bandpass processing, but there are many alternative implementation choices that can be considered. These implementations differ in the ways in which an enhanced version of the desired signal is developed using binaural principles, in the extent to which specific processing mechanisms are used to impose suppression motivated by the precedence effect, and in the precise mechanism used to extract interaural time differences. Index Terms – binaural hearing, robust speech recognition, reverberation, auditory models 1

    Discovering location indicators of toponyms from news to improve gazetteer-based geo-referencing

    No full text
    Abstract. This paper presents an approach that identifies Location Indicators related to geographical locations, by analyzing texts of news published in the Web. The goal is to semi-automatically create Gazetteers with the identified relations and then perform geo-referencing of news. Location Indicators include non-geographical entities that are dynamic and may change along the time. The use of news published in the Web is a useful way to discove
    corecore